
refactor: Speed up Historical Backfill process #379

Merged: 10 commits, Nov 12, 2023

Conversation

@morgsmccauley commented Nov 9, 2023

Problem

The Historical backfill is composed of two main steps:

  1. Fetching matching blocks from near-delta-lake/Databricks index files
  2. Manually filtering all blocks between the last_indexed_block from near-delta-lake and the block_height at which the historical process was triggered

Only after both of these steps are completed are the blocks flushed to Redis. This leads to a large delay from when the process was triggered to when the blocks are actually executed by Runner, creating the appearance of things 'not working'.

Changes

This PR makes the following changes to reduce the time between trigger and execution:

Flush indexed blocks immediately

Fetching blocks from the index file (Step 1) is a relatively quick process. Rather than wait for Step 2, we can flush these blocks immediately and then continue on to the following step.
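The difference can be sketched with a minimal std-only example. `RedisStream` and both function names are hypothetical stand-ins for the real Redis client and backfill code, not the PR's actual implementation:

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for the Redis stream that Runner consumes.
struct RedisStream {
    entries: VecDeque<u64>,
}

impl RedisStream {
    fn new() -> Self {
        Self { entries: VecDeque::new() }
    }
    fn xadd(&mut self, block_height: u64) {
        self.entries.push_back(block_height);
    }
}

// Before: both steps complete before anything is flushed, so Runner
// sees nothing until the slow manual-filtering step finishes.
fn backfill_batched(indexed: &[u64], unindexed: &[u64], stream: &mut RedisStream) {
    let mut pending = Vec::new();
    pending.extend_from_slice(indexed);   // Step 1: index-file lookup (fast)
    pending.extend_from_slice(unindexed); // Step 2: manual filtering (slow)
    for height in pending {
        stream.xadd(height);
    }
}

// After: Step 1's blocks are flushed as soon as they are known, then
// Step 2's blocks trickle in as they are filtered.
fn backfill_streaming(indexed: &[u64], unindexed: &[u64], stream: &mut RedisStream) {
    for &height in indexed {
        stream.xadd(height); // visible to Runner right away
    }
    for &height in unindexed {
        stream.xadd(height);
    }
}
```

Both variants flush the same heights in the same order; the only difference is how early the first entry becomes visible to Runner.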

Prefetch blocks in manual filtering

In manual filtering, blocks are processed sequentially. To speed this up, we can fetch ahead so that we minimise the time spent waiting for S3 requests. Luckily, near-lake-framework does exactly this, so manual filtering has been refactored to use it.
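The effect of fetching ahead can be illustrated with a std-only sketch. `fetch_block`, the filter predicate, and the readahead depth are all hypothetical; near-lake-framework's real implementation fetches blocks from S3 concurrently ahead of the consumer:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

// Stand-in for an S3 block fetch, where the network round-trip dominates.
fn fetch_block(height: u64) -> u64 {
    thread::sleep(Duration::from_millis(5));
    height
}

// Fetch blocks on a producer thread through a bounded channel, so the
// consumer's filtering overlaps with in-flight "S3" requests instead of
// the consumer waiting on each fetch sequentially.
fn prefetch_and_filter(heights: Vec<u64>, readahead: usize) -> Vec<u64> {
    let (tx, rx) = sync_channel(readahead);
    let producer = thread::spawn(move || {
        for h in heights {
            if tx.send(fetch_block(h)).is_err() {
                break; // consumer hung up
            }
        }
    });

    let mut matched = Vec::new();
    for block in rx {
        // Stand-in for the real "does this block match the indexer?" predicate.
        if block % 2 == 0 {
            matched.push(block);
        }
    }
    producer.join().unwrap();
    matched
}
```

The bounded channel caps memory use: the producer blocks once it is `readahead` blocks ahead of the consumer, which is the same back-pressure shape a prefetching block stream provides.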

Results

The following is from a backfill of 19,959,790 blocks run locally. I'd expect the Before time to be much lower on our actual infrastructure, given the geographical distance from my local machine, but the results are still positive :).

Time until the first block appears on Redis:

  • Before: 33 minutes
  • After: 2 seconds

Time to completion:

  • Before: 33 minutes
  • After: 45 seconds

@morgsmccauley linked an issue Nov 9, 2023 that may be closed by this pull request
@morgsmccauley force-pushed the 371-historical-backfill-performance-improvements branch from e5d7fce to 60cd1dd November 10, 2023 00:00
@morgsmccauley force-pushed the 371-historical-backfill-performance-improvements branch from 60cd1dd to 53649ff November 10, 2023 01:40
@morgsmccauley changed the title from "371 historical backfill performance improvements" to "refactor: Speed up Historical Backfill process" Nov 10, 2023
@morgsmccauley marked this pull request as ready for review November 10, 2023 02:49
@morgsmccauley requested a review from a team as a code owner November 10, 2023 02:49
pub const INDEXED_ACTIONS_FILES_FOLDER: &str = "silver/accounts/action_receipt_actions/metadata";
pub const MAX_UNINDEXED_BLOCKS_TO_PROCESS: u64 = 7200; // two hours of blocks takes ~14 minutes.
Collaborator

This served as a kind of alarm for when something went wrong with the index file creation process. Since we are removing it, we should have some metric or error that checks whether the latest index file failed to create. It doesn't need to be here or in this PR, though.

Collaborator

+1

Collaborator Author

This was intentional, forgot to comment on it sorry.

Now that we're using near-lake-framework, the BPS (blocks per second) should be significantly higher. I'm not sure this should be a concern anymore. I'll add a warning log for now, and we can add more later if this becomes a problem :)

);

storage::del(
Collaborator

I think there might be a bug in the runner-side prefetch code related to this. If a historical process is kicked off and it fails, the runner-side prefetch continuously pulls from its buffer array rather than the stream (although the stream message is still there). So even if you delete the stream and fill it with a new historical stream, the earlier failed message will continue to block execution. I'll test this and ship a PR to fix it if that's in fact true.

Collaborator Author

Good catch

Maybe we need to think of alternate ways of handling pre-fetch. Perhaps we could:

  • Update/overwrite the existing stream messages rather than maintaining an in-memory queue
  • Set an expire-able key in Redis, similar to the real-time caching
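The expire-able key idea could look something like the following std-only sketch. `PrefetchCache` and its methods are hypothetical; this only mimics Redis EXPIRE semantics in memory, so stale prefetched messages drop out on their own instead of blocking a replaced stream:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical sketch: track each prefetched stream message under a TTL
// rather than holding it in an unbounded in-memory queue forever.
struct PrefetchCache {
    ttl: Duration,
    // message id -> (block height, time of insertion)
    entries: HashMap<String, (u64, Instant)>,
}

impl PrefetchCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn insert(&mut self, id: &str, height: u64) {
        self.entries.insert(id.to_string(), (height, Instant::now()));
    }

    fn get(&mut self, id: &str) -> Option<u64> {
        if let Some(&(height, inserted_at)) = self.entries.get(id) {
            if inserted_at.elapsed() < self.ttl {
                return Some(height);
            }
            // Expired: prune on read, mimicking a key with EXPIRE set.
            self.entries.remove(id);
        }
        None
    }
}
```

With a real Redis-backed version, a failed message from an old historical run would simply expire instead of being replayed from the in-memory buffer.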

}
.start_block_height(last_indexed_block + 1)
.build()
.context("Failed to build lake config")?;
Collaborator

If we fail to build the lake config, should we purge the historical feed again? I think it could be confusing to the customer if it looks like their historical process ran, only to then have a gap between the block they specified and the blocks that actually ran. On that front, I feel it might be better to construct this earlier. Maybe also retry it, depending on the error type.

Collaborator Author

There are many things that can fail and cause the historical process to exit; it's not just limited to the construction of near-lake-framework.

At this point, I think it's worth pushing this out as-is so we can test. We can then have a deeper conversation about error handling later.

indexer/queryapi_coordinator/src/s3.rs (review thread resolved)
Collaborator

@gabehamilton left a comment

Commented on one thing to look into (unindexed start_block_height), otherwise looks great.

Collaborator

@darunrs left a comment

Awesome work! Excited to see these changes in action.

@morgsmccauley merged commit 530efd3 into main Nov 12, 2023
5 checks passed
@morgsmccauley deleted the 371-historical-backfill-performance-improvements branch November 12, 2023 20:59
@morgsmccauley mentioned this pull request Nov 20, 2023
Successfully merging this pull request may close these issues.

Historical backfill performance improvements
3 participants